Current Issue: October-December 2017, Issue 4 (5 articles)
Background: Advances in cloning and sequencing technology are yielding a massive number of viral genomes. The classification and annotation of these genomes constitute important assets in the discovery of genomic variability, taxonomic characteristics and disease mechanisms. Existing classification methods are often designed for a specific, well-studied family of viruses. Thus, viral comparative genomic studies could benefit from more generic, fast and accurate tools for classifying and typing newly sequenced strains of diverse virus families.
Results: Here, we introduce CASTOR, a virus classification platform based on machine learning methods. CASTOR is inspired by a well-known technique in molecular biology: restriction fragment length polymorphism (RFLP). It simulates, in silico, the restriction digestion of genomic material by different enzymes into fragments. It uses two metrics to construct feature vectors for machine learning algorithms in the classification step. We benchmark CASTOR on distinct datasets of human papillomaviruses (HPV), hepatitis B viruses (HBV) and human immunodeficiency viruses type 1 (HIV-1). Results reveal true positive rates of 99%, 99% and 98% for HPV Alpha species, HBV genotyping and HIV-1 M subtyping, respectively. Furthermore, CASTOR shows competitive performance compared to well-known HIV-1-specific classifiers (REGA and COMET) on whole genomes and pol fragments.
Conclusion: The performance, genericity and robustness of CASTOR could enable novel and accurate large-scale virus studies. The CASTOR web platform provides open-access, collaborative and reproducible machine learning classifiers. CASTOR can be accessed at http://castor.bioinfo.uqam.ca....
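The core idea of the abstract above, simulating restriction digestion in silico and turning the fragments into feature vectors, can be sketched as follows. The EcoRI and HindIII recognition sites are real, but the two features computed per enzyme (fragment count and mean fragment length) are illustrative assumptions, not CASTOR's actual metrics.

```python
import re

# Recognition sites of two common restriction enzymes (real sequences).
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}

def digest(sequence, site):
    """Lengths of fragments produced by cutting before each site occurrence."""
    cuts = [m.start() for m in re.finditer(site, sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]

def feature_vector(sequence):
    """Two simple, illustrative features per enzyme: fragment count and
    mean fragment length (CASTOR's real metrics may differ)."""
    features = []
    for site in ENZYMES.values():
        lengths = digest(sequence, site)
        features += [len(lengths), sum(lengths) / len(lengths)]
    return features

seq = "ATGAATTCGGCCAAGCTTTTGAATTCA"
features = feature_vector(seq)  # one such vector per genome feeds a classifier
```

In the real platform, vectors like this one, computed over many enzymes and genomes, become the inputs to the machine learning classifiers.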
Background: Reconstructing gene regulatory networks (GRNs) from expression data plays an important role in understanding fundamental cellular processes and revealing the underlying relations among genes. Although many algorithms have been proposed to reconstruct GRNs, faster and more efficient methods that can handle large-scale problems still need to be developed. Reconstructing GRNs from time series data can be formulated as an optimization problem, in which the reconstructed GRNs should closely simulate the observed time series. This is a typical big optimization problem, since the number of variables to be optimized increases quadratically with the scale of the GRN, resulting in an exponential increase in the number of candidate solutions. Thus, there is a legitimate need for methods capable of automatically reconstructing large-scale GRNs.
Results: In this paper, we use fuzzy cognitive maps (FCMs) to model GRNs, in which each node of the FCM represents a single gene. However, most current training algorithms for FCMs can only train FCMs with dozens of nodes. Here, a new evolutionary algorithm, termed dMAGA-FCMD, is proposed to train FCMs; it combines a dynamical multi-agent genetic algorithm (dMAGA) with a decomposition-based model and can deal with large-scale FCMs of up to 500 nodes. Both large-scale synthetic FCMs and the DREAM4 benchmark for reconstructing biological GRNs are used in the experiments to validate the performance of dMAGA-FCMD.
Conclusions: dMAGA-FCMD is compared with four other state-of-the-art FCM training algorithms, and the results show that it performs the best. In addition, the experimental results on FCMs with 500 nodes and the DREAM4 project demonstrate that dMAGA-FCMD can train large-scale FCMs and GRNs effectively and computationally efficiently....
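To make the FCM-as-GRN model concrete, here is a minimal sketch of one synchronous FCM update step: each node holds a gene's activation in [0, 1], a weight matrix encodes regulatory influences, and a sigmoid transfer function squashes the weighted input. The sigmoid is one common choice of transfer function, and the three-gene weight matrix below is purely illustrative, not learned by dMAGA-FCMD.

```python
import math

def sigmoid(x, steepness=5.0):
    """A common FCM transfer function mapping any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * x))

def fcm_step(state, weights):
    """One synchronous update: a_i(t+1) = f(sum_j w[j][i] * a_j(t))."""
    n = len(state)
    return [sigmoid(sum(weights[j][i] * state[j] for j in range(n)))
            for i in range(n)]

# Illustrative 3-gene network: gene 0 activates gene 1 (w=0.8),
# gene 1 represses gene 2 (w=-0.6). Training algorithms such as
# dMAGA-FCMD search for weights whose simulated trajectory matches
# the observed expression time series.
W = [[0.0, 0.8, 0.0],
     [0.0, 0.0, -0.6],
     [0.0, 0.0, 0.0]]
state = [1.0, 0.2, 0.5]
for _ in range(3):
    state = fcm_step(state, W)
```

Note the scaling issue the abstract describes: an n-gene network has n^2 weights to optimize, which is what makes 500-node FCMs a big optimization problem.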
Background: DNA sonification refers to the use of an auditory display to convey the information content of DNA sequence data. Six sonification algorithms are presented, each producing an auditory display. These algorithms are logically designed from the simple through to the more complex. Three of them parse individual nucleotides, nucleotide pairs or codons into musical notes, giving rise to 4, 16 or 64 notes, respectively. Codons may also be parsed degenerately into 20 notes with respect to the genetic code. Lastly, nucleotide pairs can be parsed as two separate frames, or codons as three reading frames, giving rise to multiple streams of audio.
Results: The most informative sonification algorithm reads the DNA sequence as codons in three reading frames to produce three concurrent streams of audio in an auditory display. This approach is advantageous since start and stop codons in either frame have a direct effect, starting or stopping the audio in that frame while leaving the other frames unaffected. Using these methods, DNA sequences such as open reading frames or repetitive DNA sequences can be distinguished from one another. These sonification tools are available through a web-page interface in which an input DNA sequence can be processed in real time to produce an auditory display playable directly within the browser. The potential of this approach as an analytical tool is discussed with reference to auditory displays derived from test sequences, including simple nucleotide sequences, repetitive DNA sequences and coding or non-coding genes.
Conclusion: This study presents a proof of concept that some properties of a DNA sequence can be identified through sonification alone, and argues for the inclusion of these tools within the toolkit of DNA sequence browsers as an adjunct to existing visual and analytical tools....
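The three-reading-frame codon algorithm described above can be sketched roughly as follows: each of the 64 codons maps to a distinct note (here a MIDI-style pitch number), and in each frame the start codon ATG switches the audio on while a stop codon switches it off. The specific codon-to-pitch assignment is an arbitrary assumption for illustration; the paper's actual note mappings may differ.

```python
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]   # all 64 codons
PITCH = {codon: 36 + i for i, codon in enumerate(CODONS)}  # MIDI pitches 36..99
STOPS = {"TAA", "TAG", "TGA"}

def sonify_frame(dna, frame):
    """Pitch stream for one reading frame; None marks silence outside an
    open reading frame (after a stop codon, until the next ATG)."""
    notes, playing = [], False
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon == "ATG":          # start codon switches this frame's audio on
            playing = True
        notes.append(PITCH[codon] if playing else None)
        if codon in STOPS:          # stop codon silences this frame only
            playing = False
    return notes

dna = "ATGGCATAAATGCCC"
streams = [sonify_frame(dna, f) for f in range(3)]  # three concurrent streams
```

Because start and stop codons act on one frame only, an open reading frame is audible as a sustained melodic stream in exactly one of the three outputs.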
Background: Knowing the three-dimensional (3D) structure of chromatin is important for obtaining a complete picture of the regulatory landscape, and changes in the 3D structure have been implicated in diseases. While there exist approaches that attempt to predict long-range chromatin interactions, they focus only on interactions between specific genomic regions, namely promoters and enhancers, neglecting other possibilities such as the so-called structural interactions involving intervening chromatin.
Results: We present a method that can be trained on 5C data, using the genetic sequence of the candidate loci, to predict potential genome-wide interaction partners of a particular locus of interest. We have built locus-specific support vector machine (SVM)-based predictors using the oligomer distance histogram (ODH) representation. The method shows good performance, with a mean test AUC (area under the receiver operating characteristic (ROC) curve) of 0.7 or higher for various regions across the cell lines GM12878, K562 and HeLa-S3. In cases where a locus did not have sufficient candidate interaction partners for model training, we employed multitask learning to share knowledge between models of different loci. In this scenario, across the three cell lines, the method attained an average performance increase of 0.09 in the AUC. Evaluating the models trained on 5C data on an independent high-resolution Hi-C dataset (a considerably harder problem) yields an average AUC of 0.56. Additionally, we have developed new, intuitive visualization methods that enable interpretation of the sequence signals that contributed towards prediction of locus-specific interaction partners. The analysis of these sequence signals suggests a potential general role of short tandem repeat sequences in genome organization.
Conclusions: We demonstrated how our approach can 1) provide insights into sequence features of locus-specific interaction partners and 2) identify their cell-line specificity. That our models deem short tandem repeat sequences discriminative for the prediction of potential interaction partners suggests that these sequences could play a larger role in genome organization. Thus, our approach can (a) help in broadly understanding, at the sequence level, chromatin interactions and higher-order structures such as (meta-) topologically associating domains (TADs); (b) study regions omitted from existing prediction approaches, using various information sources (e.g., epigenetic information); and (c) improve methods that predict the 3D structure of chromatin....
Principal Component Analysis (PCA) as a tool for dimensionality reduction is widely used in many areas. In bioinformatics, each involved variable corresponds to a specific gene. In order to improve the robustness of PCA-based methods, this paper proposes a novel graph-Laplacian PCA algorithm by adopting...
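As a reminder of the dimensionality-reduction step this abstract builds on, here is a minimal sketch of plain PCA on a tiny two-variable ("two-gene") dataset; the graph-Laplacian variant the paper proposes is not shown, since its description is truncated above. For two variables the leading principal component follows in closed form from the 2x2 covariance matrix; the dataset is illustrative, and real genomic data would use a linear-algebra library.

```python
import math

def pca_first_component(data):
    """Leading eigenvalue and unit eigenvector of the sample covariance
    matrix of 2D points, i.e. the first principal component."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    # Closed-form leading eigenvalue of [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # (lam - syy, sxy) is a corresponding (unnormalized) eigenvector.
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)

data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
eigval, direction = pca_first_component(data)
```

Projecting each point onto `direction` reduces the two correlated variables to one coordinate while retaining most of the variance, which is exactly the behavior a graph-Laplacian regularizer would aim to make more robust.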